A Collocation Method to Extract Biological Keywords and Its Application to Protein Name Recognition

Authors

  • Wen-Juan Hou
  • Hsin-Hsi Chen
Abstract

Mining biological relationships from scientific text, which facilitates the automatic construction of knowledge bases and the discovery of new information, has become an emerging application. Named entity recognition is a fundamental task in relationship mining. Traditional methods rely on a pre-defined set of words, chosen by intuition, to indicate protein or gene interactions; the drawback is that there is no assurance that such a keyword set is complete. A collocation approach not only mines biological keywords for establishing relationships but also uses those keywords to improve the performance of protein name recognition. This paper proposes a collocation approach to extract biological keywords or key phrases. Frequency, mean-and-variance, and t-test statistics are computed, and compound collocates are also resolved. The experiments show that the performance of the t-test model is 76.67% and 95.45% without and with consideration of text domains, respectively. The paper suggests many useful terms that the previous literature does not cover. The results are valuable for investigating relationships between proteins and can be integrated into genome analysis systems. In the application to protein name recognition, the approach raises the precision of the Yapex system from 70.90% to 81.94%.
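The abstract names mean-and-variance and t-test statistics but does not give their formulas. As a rough illustration only, the sketch below uses the textbook formulations common in the collocation literature: a t score for adjacent word pairs, with the sample variance approximated by x̄(1 − x̄), and the mean and sample variance of signed offsets between two words within a window. The function names, the whitespace tokenization, and the window size are assumptions made for this example, not details taken from the paper.

```python
import math
from collections import Counter


def t_score(tokens, w1, w2):
    """t statistic for the adjacent pair (w1, w2).

    Assumption: textbook collocation t-test, with the sample
    variance approximated by x_bar * (1 - x_bar).
    """
    n_bigrams = len(tokens) - 1
    if n_bigrams < 1:
        return 0.0
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # Observed relative frequency of the pair.
    x_bar = bigrams[(w1, w2)] / n_bigrams
    # Expected frequency if the two words were independent.
    mu = (unigrams[w1] / len(tokens)) * (unigrams[w2] / len(tokens))
    s2 = x_bar * (1 - x_bar)
    if s2 == 0:
        # Pair never seen (or is the only bigram): no usable estimate.
        return 0.0
    return (x_bar - mu) / math.sqrt(s2 / n_bigrams)


def offset_stats(tokens, w1, w2, window=5):
    """Mean and sample variance of signed offsets from w1 to w2.

    Only co-occurrences within `window` tokens are counted.
    Returns None when fewer than two co-occurrences exist.
    """
    positions_w2 = [i for i, tok in enumerate(tokens) if tok == w2]
    offsets = []
    for i, tok in enumerate(tokens):
        if tok != w1:
            continue
        for j in positions_w2:
            d = j - i
            if d != 0 and abs(d) <= window:
                offsets.append(d)
    if len(offsets) < 2:
        return None
    mean = sum(offsets) / len(offsets)
    var = sum((d - mean) ** 2 for d in offsets) / (len(offsets) - 1)
    return mean, var


# Toy corpus, invented for illustration.
tokens = "raf binds ras and mek binds erk while raf activates mek".split()
print(t_score(tokens, "raf", "binds"))
print(offset_stats(tokens, "raf", "mek"))
```

In this formulation a higher t score means the pair occurs more often than independence would predict, and a low offset variance means the two words tend to appear at a fixed distance from each other. Real corpora would need proper tokenization and a significance threshold, both of which are left out here.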

Similar resources

Enhancing Performance of Protein Name Recognizers Using Collocation

Named entity recognition is a fundamental task in biological relationship mining. This paper employs protein collocates extracted from a biological corpus to enhance the performance of protein name recognizers. Yapex and KeX are taken as examples. The precision of Yapex is increased from 70.90% to 81.94% at a small cost in recall (a decrease of only 2.39%) when collocates are incorpora...

Cloning, Expression and Purification of Truncated Chlamydia Trachomatis Outer Membrane Protein 2 (Omp2) and its Application in an ELISA Assay

Background: Although a simple and direct method does not exist for the detection of chlamydial infections, there are situations in which reliable serological tests, with sensitivity related to a specific antigen, could be helpful. Objective: The aim of this study was to clone the first 1100 bp of the C. trachomatis outer membrane protein 2 (omp2) gene in order to prepare a recombinant protein ...

The protein-nanoparticle interaction (protein corona) and its importance on the therapeutic application of nanoparticles

Nanobiotechnology has provided promising novel diagnostic and therapeutic strategies that can create a broad spectrum of nano-based imaging agents and medicines for human administration. Several studies have demonstrated that the surface of nanomaterials is immediately coated with suspended proteins after contact with plasma or other biological fluids to form protein corona-nanoparticl...

The Tau-Collocation Method for Solving Nonlinear Integro-Differential Equations and Application of a Population Model

This paper presents a computational technique, called the Tau-collocation method, for solving nonlinear integro-differential equations that involve a population model. To do this, the nonlinear integro-differential equations are transformed into a system of linear algebraic equations in matrix form without interpolation of the non-polynomial terms of the equations. Then, using coll...

Enhancing performance of protein and gene name recognizers with filtering and integration strategies

Named entity (NE) recognition is a fundamental task in biological relationship mining. This paper considers protein/gene collocates extracted from biological corpora as restrictions to enhance the precision rate of protein/gene name recognition. In addition, we integrate the results of multiple NE recognizers to improve the recall rates. Yapex and KeX, and ABGene and Idgene are taken as example...


Publication date: 2003